swap property
- North America > Canada (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Health & Medicine > Therapeutic Area > Immunology (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- North America > United States > Illinois (0.04)
- Asia > Middle East > Oman (0.04)
Normalizing Flows for Knockoff-free Controlled Feature Selection
Controlled feature selection aims to discover the features a response depends on while limiting the false discovery rate (FDR) to a predefined level. Recently, multiple deep-learning-based methods have been proposed to perform controlled feature selection through the Model-X knockoff framework. We demonstrate, however, that these methods often fail to control the FDR for two reasons. First, these methods often learn inaccurate models of features. Second, the swap property, which is required for knockoffs to be valid, is often not well enforced.
Deep Direct Likelihood Knockoffs
Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the false discovery rate (FDR). However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called swap property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving the largest COVID-19 health record dataset in the United States.
- North America > United States > Illinois (0.04)
- Asia > Middle East > Oman (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology (0.67)
- Health & Medicine > Therapeutic Area > Gastroenterology (0.46)
- North America > Canada (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Health & Medicine > Therapeutic Area > Immunology (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Deep Direct Likelihood Knockoffs
Sudarshan, Mukund, Tansey, Wesley, Ranganath, Rajesh
Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the FDR. However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called "swap" property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving a large dataset from one of the epicenters of COVID-19.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)